436 research outputs found

    Efficient Frequent Subtree Mining Beyond Forests

    Get PDF
    A common paradigm in distance-based learning is to embed the instance space into some appropriately chosen feature space equipped with a metric and to define the dissimilarity between instances by the distance of their images in the feature space. If the instances are graphs, then frequent connected subgraphs are a well-suited pattern language to define such feature spaces. Identifying the set of frequent connected subgraphs and subsequently computing embeddings for graph instances, however, is computationally intractable. As a result, existing frequent subgraph mining algorithms either restrict the structural complexity of the instance graphs or require exponential delay between the output of subsequent patterns. Hence distance-based learners lack an efficient way to operate on arbitrary graph data. To resolve this problem, in this thesis we present a mining system that gives up the demand on the completeness of the pattern set to instead guarantee a polynomial delay between subsequent patterns. Complementing this, we devise efficient methods to compute the embedding of arbitrary graphs into the Hamming space spanned by our pattern set. As a result, we present a system that allows to efficiently apply distance-based learning methods to arbitrary graph databases. To overcome the computational intractability of the mining step, we consider only frequent subtrees for arbitrary graph databases. This restriction alone, however, does not suffice to make the problem tractable. We reduce the mining problem from arbitrary graphs to forests by replacing each graph by a polynomially sized forest obtained from a random sample of its spanning trees. This results in an incomplete mining algorithm. However, we prove that the probability of missing a frequent subtree pattern is low. We show empirically that this is true in practice even for very small sized forests. As a result, our algorithm is able to mine frequent subtrees in a range of graph databases where state-of-the-art exact frequent subgraph mining systems fail to produce patterns in reasonable time or even at all. Furthermore, the predictive performance of our patterns is comparable to that of exact frequent connected subgraphs, where available. The above method considers polynomially many spanning trees for the forest, while many graphs have exponentially many spanning trees. The number of patterns found by our mining algorithm can be negatively influenced by this exponential gap. We hence propose a method that can (implicitly) consider forests of exponential size, while remaining computationally tractable. This results in a higher recall for our incomplete mining algorithm. Furthermore, the methods extend the known positive results on the tractability of exact frequent subtree mining to a novel class of transaction graphs. We conjecture that the next natural extension of our results to a larger transaction graph class is at least as difficult as proving whether P = NP, or not. Regarding the graph embedding step, we apply a similar strategy as in the mining step. We represent a novel graph by a forest of its spanning trees and decide whether the frequent trees from the mining step are subgraph isomorphic to this forest. As a result, the embedding computation has one-sided error with respect to the exact subgraph isomorphism test but is computationally tractable. Furthermore, we show that we can leverage a partial order on the pattern set. This structure can be used to reduce the runtime of the embedding computation dramatically. For the special case of Jaccard-similarity between graph embeddings, a further substantial reduction of runtime can be achieved using min-hashing. The Jaccard-distance can be approximated using small sketch vectors that can be computed fast, again using the partial order on the tree patterns

    ALIGNMENT OF BUSINESS PROCESS MANAGEMENT AND BUSINESS RULES

    Get PDF
    Business process management and business rules management both focus on controlling business activities in organizations. Although both management principles have the same focus, they approach manageability and controllability from different perspectives. As more organizations deploy business process management and business rules management, this paper argues that these often separated efforts should be integrated. The goal of this work is to present a step towards this integration. We propose a business rule categorization that is aligned to the business process management lifecycle. In a case study and through a survey the proposed rule categories are validated in terms of mutual exclusivity and completeness. The results indicate the completeness of our main categorization and the categories’ mutual exclusivity. Future research should indicate further refinement by identifying rule subcategories

    Expectation-Complete Graph Representations with Homomorphisms

    Full text link
    We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. Previous graph embeddings have limited expressiveness and either cannot distinguish all graphs or cannot be computed efficiently for every graph. To be able to approximate arbitrary functions on graphs, we are interested in efficient alternatives that become arbitrarily expressive with increasing resources. Our approach is based on Lov\'asz' characterisation of graph isomorphism through an infinite dimensional vector of homomorphism counts. Our empirical evaluation shows competitive results on several benchmark graph learning tasks.Comment: accepted for publication at ICML 202

    Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages

    Full text link
    We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for a number of 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In this way the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.Comment: 40 pages, 13 figures, 5 table

    SUSAN: The Structural Similarity Random Walk Kernel

    Get PDF
    Random walk kernels are a very flexible family of graph kernels, in which we can incorporate edge and vertex similarities through positive definite kernels. In this work we study the particular case within this family in which the vertex kernel has bounded support. We motivate this property as the configurable flexibility in terms of vertex alignment between the two graphs on which the walk is performed. We study several fast and intuitive ways to derive structurally aware labels and combine them with such a vertex kernel, which in turn is incorporated in the random walk kernel. We provide a fast algorithm to compute the resulting random walk kernel and we give precise bounds on its computational complexity. We show that this complexity always remains upper bounded by that of alternative methods in the literature and study conditions under which this advantage can be significantly higher. We evaluate the resulting configurations on their predictive performance on several families of graphs and show significant improvements against the vanilla random walk kernel and other competing algorithms. Read More: https://epubs.siam.org/doi/abs/10.1137/1.9781611976700.3

    Differential cross section measurements for the production of a W boson in association with jets in proton–proton collisions at √s = 7 TeV

    Get PDF
    Measurements are reported of differential cross sections for the production of a W boson, which decays into a muon and a neutrino, in association with jets, as a function of several variables, including the transverse momenta (pT) and pseudorapidities of the four leading jets, the scalar sum of jet transverse momenta (HT), and the difference in azimuthal angle between the directions of each jet and the muon. The data sample of pp collisions at a centre-of-mass energy of 7 TeV was collected with the CMS detector at the LHC and corresponds to an integrated luminosity of 5.0 fb[superscript −1]. The measured cross sections are compared to predictions from Monte Carlo generators, MadGraph + pythia and sherpa, and to next-to-leading-order calculations from BlackHat + sherpa. The differential cross sections are found to be in agreement with the predictions, apart from the pT distributions of the leading jets at high pT values, the distributions of the HT at high-HT and low jet multiplicity, and the distribution of the difference in azimuthal angle between the leading jet and the muon at low values.United States. Dept. of EnergyNational Science Foundation (U.S.)Alfred P. Sloan Foundatio

    Optimasi Portofolio Resiko Menggunakan Model Markowitz MVO Dikaitkan dengan Keterbatasan Manusia dalam Memprediksi Masa Depan dalam Perspektif Al-Qur`an

    Full text link
    Risk portfolio on modern finance has become increasingly technical, requiring the use of sophisticated mathematical tools in both research and practice. Since companies cannot insure themselves completely against risk, as human incompetence in predicting the future precisely that written in Al-Quran surah Luqman verse 34, they have to manage it to yield an optimal portfolio. The objective here is to minimize the variance among all portfolios, or alternatively, to maximize expected return among all portfolios that has at least a certain expected return. Furthermore, this study focuses on optimizing risk portfolio so called Markowitz MVO (Mean-Variance Optimization). Some theoretical frameworks for analysis are arithmetic mean, geometric mean, variance, covariance, linear programming, and quadratic programming. Moreover, finding a minimum variance portfolio produces a convex quadratic programming, that is minimizing the objective function ðð¥with constraintsð ð 𥠥 ðandð´ð¥ = ð. The outcome of this research is the solution of optimal risk portofolio in some investments that could be finished smoothly using MATLAB R2007b software together with its graphic analysis

    Impacts of the Tropical Pacific/Indian Oceans on the Seasonal Cycle of the West African Monsoon

    Get PDF
    The current consensus is that drought has developed in the Sahel during the second half of the twentieth century as a result of remote effects of oceanic anomalies amplified by local land–atmosphere interactions. This paper focuses on the impacts of oceanic anomalies upon West African climate and specifically aims to identify those from SST anomalies in the Pacific/Indian Oceans during spring and summer seasons, when they were significant. Idealized sensitivity experiments are performed with four atmospheric general circulation models (AGCMs). The prescribed SST patterns used in the AGCMs are based on the leading mode of covariability between SST anomalies over the Pacific/Indian Oceans and summer rainfall over West Africa. The results show that such oceanic anomalies in the Pacific/Indian Ocean lead to a northward shift of an anomalous dry belt from the Gulf of Guinea to the Sahel as the season advances. In the Sahel, the magnitude of rainfall anomalies is comparable to that obtained by other authors using SST anomalies confined to the proximity of the Atlantic Ocean. The mechanism connecting the Pacific/Indian SST anomalies with West African rainfall has a strong seasonal cycle. In spring (May and June), anomalous subsidence develops over both the Maritime Continent and the equatorial Atlantic in response to the enhanced equatorial heating. Precipitation increases over continental West Africa in association with stronger zonal convergence of moisture. In addition, precipitation decreases over the Gulf of Guinea. During the monsoon peak (July and August), the SST anomalies move westward over the equatorial Pacific and the two regions where subsidence occurred earlier in the seasons merge over West Africa. The monsoon weakens and rainfall decreases over the Sahel, especially in August.Peer reviewe

    Search for heavy resonances decaying to two Higgs bosons in final states containing four b quarks

    Get PDF
    A search is presented for narrow heavy resonances X decaying into pairs of Higgs bosons (H) in proton-proton collisions collected by the CMS experiment at the LHC at root s = 8 TeV. The data correspond to an integrated luminosity of 19.7 fb(-1). The search considers HH resonances with masses between 1 and 3 TeV, having final states of two b quark pairs. Each Higgs boson is produced with large momentum, and the hadronization products of the pair of b quarks can usually be reconstructed as single large jets. The background from multijet and t (t) over bar events is significantly reduced by applying requirements related to the flavor of the jet, its mass, and its substructure. The signal would be identified as a peak on top of the dijet invariant mass spectrum of the remaining background events. No evidence is observed for such a signal. Upper limits obtained at 95 confidence level for the product of the production cross section and branching fraction sigma(gg -> X) B(X -> HH -> b (b) over barb (b) over bar) range from 10 to 1.5 fb for the mass of X from 1.15 to 2.0 TeV, significantly extending previous searches. For a warped extra dimension theory with amass scale Lambda(R) = 1 TeV, the data exclude radion scalar masses between 1.15 and 1.55 TeV
    corecore